<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <link rel="stylesheet" href="../../aosa.css" type="text/css">
    <title>500 Lines or Less: Contingent: A Fully Dynamic Build System</title>
  </head>
  <body>

    <div class="titlebox">
      <h1>500 Lines or Less<br>Contingent: A Fully Dynamic Build System</h1>
      <p class="author">Brandon Rhodes and Daniel Rocco</p>
    </div>

    <p><em>Brandon Rhodes started using Python in the late 1990s, and for 17 years has maintained the PyEphem library for amateur astronomers. He works at Dropbox, has taught Python programming courses for corporate clients, consulted on projects like the New England Wildflower Society's “Go Botany” Django site, and will be the chair of the PyCon conference in 2016 and 2017. Brandon believes that well-written code is a form of literature, that beautifully formatted code is a work of graphic design, and that correct code is one of the most transparent forms of thought.</em></p>

<p><em>Daniel Rocco loves Python, coffee, craft, stout, object and system design, bourbon, teaching, trees, and Latin guitar. Thrilled that he gets to write Python for a living, he is always on the lookout for opportunities to learn from others in the community, and to contribute by sharing knowledge. He is a frequent speaker at PyAtl on introductory topics, testing, design, and shiny things; he loves seeing the spark of wonder and delight in people's eyes when someone shares a novel, surprising, or beautiful idea. Daniel lives in Atlanta with a microbiologist and four aspiring rocketeers.</em></p>

<h2 id="introduction">Introduction</h2>

<p>Build systems have long been a standard tool within computer programming.</p>

<p>The standard <code>make</code> build system, for which its author won the ACM Software System Award, was first developed in 1976. It not only lets you declare that an output file depends upon one (or more) inputs, but lets you do this recursively. A program, for example, might depend upon an object file which itself depends upon the corresponding source code:</p>

<pre><code>    prog: main.o
            cc -o prog main.o

    main.o: main.c
            cc -C -o main.o main.c</code></pre>

<p>Should <code>make</code> discover, upon its next invocation, that the <code>main.c</code> source code file now has a more recent modify time than <code>main.o</code>, then it will not only rebuild the <code>main.o</code> object file but will also rebuild <code>prog</code> itself.</p>

<p>Build systems are a common semester project assigned to undergraduate computer science students — not only because build systems are used in nearly all software projects, but because their construction involves fundamental data structures and algorithms involving directed graphs (which this chapter will later discuss in more detail).</p>

<p>With decades of use and practice behind build systems, one might expect them to have become completely general-purpose and ready for even the most extravagant demands. But, in fact, one kind of common interaction between build artifacts — the problem of dynamic cross-referencing — is handled so poorly by most build systems that in this chapter we are inspired to not only rehearse the classic solution and data structures used to solve the <code>make</code> problem, but to extend that solution dramatically, to a far more demanding domain.</p>

<p>The problem, again, is cross-referencing. Where do cross-references tend to emerge? In text documents, documentation, and printed books! </p>

<h2 id="the-problem-building-document-systems">The Problem: Building Document Systems</h2>

<p>Systems to rebuild formatted documents from source always seem to do too much work, or too little.</p>

<p>They do too much work when they respond to a minor edit by making you wait for unrelated chapters to be re-parsed and re-formatted. But they can also rebuild too little, leaving you with an inconsistent final product.</p>

<p>Consider <a href="http://sphinx-doc.org/">Sphinx</a>, the document builder that is used for both the official Python language documentation and many other projects in the Python community. A Sphinx project’s <code>index.rst</code> will usually include a table of contents:</p>

<pre><code>   Table of Contents
   =================

   .. toctree::

      install.rst
      tutorial.rst
      api.rst</code></pre>

<p>This list of chapter filenames tells Sphinx to include a link to each of the three named chapters when it builds the <code>index.html</code> output file. It will also include links to any sections within each chapter. Stripped of its markup, the text that results from the above title and <code>toctree</code> command might be:</p>

<pre><code>  Table of Contents

  • Installation

  • Newcomers Tutorial
      • Hello, World
      • Adding Logging

  • API Reference
      • Handy Functions
      • Obscure Classes</code></pre>

<p>This table of contents, as you can see, is a mash-up of information from four different files. While its basic order and structure come from <code>index.rst</code>, the actual titles of each chapter and section are pulled from the three chapter source files themselves.</p>

<p>If you later reconsider the tutorial’s chapter title — after all, the word “newcomer” sounds so quaint, as if your users are settlers who have just arrived in pioneer Wyoming — then you would edit the first line of <code>tutorial.rst</code> and write something better:</p>

<pre><code>  -Newcomers Tutorial
  +Beginners Tutorial
   ==================

   Welcome to the tutorial!
   This text will take you through the basics of...</code></pre>

<p>When you are ready to rebuild, Sphinx will do exactly the right thing! It will rebuild both the tutorial chapter itself, and the index. (Piping the output into <code>cat</code> makes Sphinx announce each rebuilt file on a separate line, instead of using bare carriage returns to repeatedly overwrite a single line with these progress updates.)</p>

<pre><code>   $ make html | cat
   writing output... [ 50%] index
   writing output... [100%] tutorial</code></pre>

<p>Because Sphinx chose to rebuild both documents, not only will <code>tutorial.html</code> now feature its new title up at the top, but the output <code>index.html</code> will display the updated chapter title in the table of contents. Sphinx has rebuilt everything so that the output is consistent.</p>

<p>What if your edit to <code>tutorial.rst</code> is more minor?</p>

<pre><code>   Beginners Tutorial
   ==================

  -Welcome to the tutorial!
  +Welcome to our project tutorial!
   This text will take you through the basics of...</code></pre>

<p>In this case there is no need to rebuild <code>index.html</code> because this minor edit to the interior of a paragraph does not change any of the information in the table of contents. But it turns out that Sphinx is not quite as clever as it might have at first appeared! It will go ahead and perform the redundant work of rebuilding <code>index.html</code> even though the resulting contents will be exactly the same.</p>

<pre><code>   writing output... [ 50%] index
   writing output... [100%] tutorial</code></pre>

<p>You can run <code>diff</code> on the “before” and “after” versions of <code>index.html</code> to confirm that your small edit has had no effect on the front page — yet Sphinx made you wait while it was rebuilt anyway.</p>

<p>You might not even notice the extra rebuild effort for small documents that are easy to compile. But the delay to your workflow can become significant when you are making frequent tweaks and edits to documents that are long, complex, or that involve the generation of multimedia like plots or animations. While Sphinx is at least making an effort not to rebuild every chapter when you make a single change — it has not, for example, rebuilt <code>install.html</code> or <code>api.html</code> in response to your <code>tutorial.rst</code> edit — it is doing more than is necessary.</p>

<p>But it turns out that Sphinx does something even worse: it sometimes does too little, leaving you with inconsistent output that could be noticed by users.</p>

<p>To see one of its simplest failures, first add a cross reference to the top of your API documentation:</p>

<pre><code>   API Reference
   =============

  +Before reading this, try reading our :doc:`tutorial`!
  +
   The sections below list every function
   and every single class and method offered...</code></pre>

<p>With its usual caution as regards the table of contents, Sphinx will dutifully rebuild both this API reference document as well as the <code>index.html</code> home page of your project:</p>

<pre><code>   writing output... [ 50%] api
   writing output... [100%] index</code></pre>

<p>In the <code>api.html</code> output file you can confirm that Sphinx has included the attractive human-readable title of the tutorial chapter into the cross reference’s anchor tag:</p>

<pre class="sourceCode html"><code class="sourceCode html">   <span class="kw">&lt;p&gt;</span>Before reading this, try reading our
   <span class="kw">&lt;a</span><span class="ot"> class=</span><span class="st">&quot;reference internal&quot;</span><span class="ot"> href=</span><span class="st">&quot;tutorial.html&quot;</span><span class="kw">&gt;</span>
     <span class="kw">&lt;em&gt;</span>Beginners Tutorial<span class="kw">&lt;/em&gt;</span>
   <span class="kw">&lt;/a&gt;</span>!<span class="kw">&lt;/p&gt;</span></code></pre>

<p>What if you now make another edit to the title at the top of the <code>tutorial.rst</code> file? You will have invalidated <em>three</em> output files:</p>

<ol style="list-style-type: decimal">
<li><p>The title at the top of <code>tutorial.html</code> is now out of date, so the file needs to be rebuilt.</p></li>
<li><p>The table of contents in <code>index.html</code> still has the old title, so that document needs to be rebuilt.</p></li>
<li><p>The embedded cross reference in the first paragraph of <code>api.html</code> still has the old chapter title, and also needs to be rebuilt.</p></li>
</ol>

<p>What does Sphinx do?</p>

<pre><code>   writing output... [ 50%] index
   writing output... [100%] tutorial</code></pre>

<p>Whoops.</p>

<p>Only two files were rebuilt, not three. Sphinx has failed to correctly rebuild your documentation.</p>

<p>If you now push your HTML to the web, users will see the old title in the cross reference at the top of <code>api.html</code> but then a different title — the new one — once the link has carried them to <code>tutorial.html</code> itself. This can happen for many kinds of cross reference that Sphinx supports: chapter titles, section titles, paragraphs, classes, methods, and functions.</p>

<h2 id="build-systems-and-consistency">Build Systems and Consistency</h2>

<p>The problem outlined above is not specific to Sphinx. Not only does it haunt other document systems, like LaTeX, but it can even plague projects that are simply trying to direct compilation steps with the venerable <code>make</code> utility, if their assets happen to cross-reference in interesting ways.</p>

<p>As the problem is ancient and universal, its solution is of equally long lineage:</p>

<pre class="sourceCode bash"><code class="sourceCode bash">   $ <span class="kw">rm</span> -r _build/
   $ <span class="kw">make</span> html</code></pre>

<p>If you remove all of the output, you are guaranteed a complete rebuild! Some projects even alias <code>rm</code> <code>-r</code> to a target named <code>clean</code> so that only a quick <code>make</code> <code>clean</code> is necessary to wipe the slate.</p>

<p>By eliminating every copy of every intermediate or output asset, a hefty <code>rm</code> <code>-r</code> is able to force the build to start over again with nothing cached — with no memory of its earlier state that could possibly lead to a stale product.</p>

<p>But could we develop a better approach?</p>

<p>What if your build system were a persistent process that noticed every chapter title, every section title, and every cross-referenced phrase as it passed from the source code of one document into the text of another? Its decisions about whether to rebuild other documents after a change to a single source file could be precise, instead of mere guesses, and correct, instead of leaving the output in an inconsistent state.</p>

<p>The result would be a system like the old static <code>make</code> tool, but which learned the dependencies between files as they were built — that added and removed dependencies dynamically as cross references were added, updated, and deleted.</p>

<p>In the sections that follow we will construct such a tool, named Contingent, in Python. Contingent guarantees correctness in the presence of dynamic dependencies while performing the fewest possible rebuild steps. While it can be applied to any problem domain, we will run it against a small version of the problem outlined above.</p>

<h2 id="linking-tasks-to-make-a-graph">Linking Tasks to Make a Graph</h2>

<p>Any build system needs a way to link inputs and outputs. The three markup texts in our discussion above, for example, each produce a corresponding HTML output file. The most natural way to express these relationships is as a collection of boxes and arrows — or, in mathematical terminology, <em>nodes</em> and <em>edges</em> — to form a <em>graph</em> (<a href="#figure-4.1">Figure 4.1</a>).</p>

<div class="center figure">
<a name="figure-4.1"></a><img src="contingent-images/figure1.png" alt="Figure 4.1 - Three files generated by parsing three input texts." title="Figure 4.1 - Three files generated by parsing three input texts." />
</div>

<p class="center figcaption">
<small>Figure 4.1 - Three files generated by parsing three input texts.</small>
</p>

<p>Each language in which a programmer might tackle writing a build system will offer various data structures with which such a graph of nodes and edges might be represented.</p>

<p>How could we represent such a graph in Python?</p>

<p>The Python language gives priority to four generic data structures by giving them direct support in the language syntax. You can create new instances of these big-four data structures by simply typing their literal representation into your source code, and their four type objects are available as built-in symbols that can be used without being imported.</p>

<p>The <strong>tuple</strong> is a read-only sequence used to hold heterogeneous data — each slot in a tuple typically means something different. Here, a tuple holds together a hostname and port number, and would lose its meaning if the elements were re-ordered:</p>

<pre class="sourceCode python"><code class="sourceCode python">(<span class="st">&#39;dropbox.com&#39;</span>, <span class="dv">443</span>)</code></pre>

<p>The <strong>list</strong> is a mutable sequence used to hold homogenous data — each item usually has the same structure and meaning as its peers. Lists can be used either to preserve data’s original input order, or can be rearranged or sorted to establish a new and more useful order.</p>

<pre class="sourceCode python"><code class="sourceCode python">[<span class="st">&#39;C&#39;</span>, <span class="st">&#39;Awk&#39;</span>, <span class="st">&#39;TCL&#39;</span>, <span class="st">&#39;Python&#39;</span>, <span class="st">&#39;JavaScript&#39;</span>]</code></pre>

<p>The <strong>set</strong> does not preserve order. Sets remember only whether a given value has been added, not how many times, and are therefore the go-to data structure for removing duplicates from a data stream. For example, the following two sets will each have three elements:</p>

<pre class="sourceCode python"><code class="sourceCode python">{<span class="dv">3</span>, <span class="dv">4</span>, <span class="dv">5</span>}
{<span class="dv">3</span>, <span class="dv">4</span>, <span class="dv">5</span>, <span class="dv">4</span>, <span class="dv">4</span>, <span class="dv">3</span>, <span class="dv">5</span>, <span class="dv">4</span>, <span class="dv">5</span>, <span class="dv">3</span>, <span class="dv">4</span>, <span class="dv">5</span>}</code></pre>

<p>The <strong>dict</strong> is an associative data structure for storing values accessible by a key. Dicts let the programmer chose the key by which each value is indexed, instead of using automatic integer indexing as the tuple and list do. The lookup is backed by a hash table, which means that dict key lookup runs at the same speed whether the dict has a dozen or a million keys.</p>

<pre class="sourceCode python"><code class="sourceCode python">{<span class="st">&#39;ssh&#39;</span>: <span class="dv">22</span>, <span class="st">&#39;telnet&#39;</span>: <span class="dv">23</span>, <span class="st">&#39;domain&#39;</span>: <span class="dv">53</span>, <span class="st">&#39;http&#39;</span>: <span class="dv">80</span>}</code></pre>

<p>A key to Python’s flexibility is that these four data structures are composable. The programmer can arbitrarily nest them inside each other to produce more complex data stores whose rules and syntax remain the simple ones of the underlying tuples, lists, sets, and dicts.</p>

<p>Given that each of our graph edges needs to know at least its origin node and its destination node, the simplest possible representation would be a tuple. The top edge in <a href="#figure-4.1">Figure 4.1</a> might look like:</p>

<pre class="sourceCode python"><code class="sourceCode python">    (<span class="st">&#39;tutorial.rst&#39;</span>, <span class="st">&#39;tutorial.html&#39;</span>)</code></pre>

<p>How can we store several edges? While our initial impulse might be to simply throw all of our edge tuples into a list, that would have disadvantages. A list is careful to maintain order, but it is not meaningful to talk about an absolute order for the edges in a graph. And a list would be perfectly happy to hold several copies of exactly the same edge, even though we only want it to be possible to draw a single arrow between <code>tutorial.rst</code> and <code>tutorial.html</code>. The correct choice is thus the set, which would have us represent <a href="#figure-4.1">Figure 4.1</a> as:</p>

<pre class="sourceCode python"><code class="sourceCode python">    {(<span class="st">&#39;tutorial.rst&#39;</span>, <span class="st">&#39;tutorial.html&#39;</span>),
     (<span class="st">&#39;index.rst&#39;</span>, <span class="st">&#39;index.html&#39;</span>),
     (<span class="st">&#39;api.rst&#39;</span>, <span class="st">&#39;api.html&#39;</span>)}</code></pre>

<p>This would allow quick iteration across all of our edges, fast insert and delete operations for a single edge, and a quick way to check whether a particular edge was present.</p>

<p>Unfortunately, those are not the only operations we need.</p>

<p>A build system like Contingent needs to understand the relationship between a given node and all the nodes connected to it. For example, when <code>api.rst</code> changes, Contingent needs to know which assets, if any, are affected by that change in order to minimize the work performed while also ensuring a complete build. To answer this question — “what nodes are downstream from <code>api.rst</code>?” — we need to examine the <em>outgoing</em> edges from <code>api.rst</code>.</p>

<p>But building the dependency graph requires that Contingent be concerned with a node's <em>inputs</em> as well. What inputs were used, for example, when the build system assembled the output document <code>tutorial.html</code>? It is by watching the input to each node that Contingent can know that <code>api.html</code> depends on <code>api.rst</code> but that <code>tutorial.html</code> does not. As sources change and rebuilds occur, Contingent rebuilds the incoming edges of each changed node to remove potentially stale edges and re-learn which resources a task uses this time around.</p>

<p>Our set-of-tuples does not make answering either of these questions easy. If we needed to know the relationship between <code>api.html</code> and the rest of the graph, we would need to traverse the entire set looking for edges that start or end at the <code>api.html</code> node.</p>

<p>An associative data structure like Python's dict would make these chores easier by allowing direct lookup of all the edges from a particular node:</p>

<pre class="sourceCode python"><code class="sourceCode python">    {<span class="st">&#39;tutorial.rst&#39;</span>: {(<span class="st">&#39;tutorial.rst&#39;</span>, <span class="st">&#39;tutorial.html&#39;</span>)},
     <span class="co">&#39;tutorial.html&#39;</span>: {(<span class="st">&#39;tutorial.rst&#39;</span>, <span class="st">&#39;tutorial.html&#39;</span>)},
     <span class="co">&#39;index.rst&#39;</span>: {(<span class="st">&#39;index.rst&#39;</span>, <span class="st">&#39;index.html&#39;</span>)},
     <span class="co">&#39;index.html&#39;</span>: {(<span class="st">&#39;index.rst&#39;</span>, <span class="st">&#39;index.html&#39;</span>)},
     <span class="co">&#39;api.rst&#39;</span>: {(<span class="st">&#39;api.rst&#39;</span>, <span class="st">&#39;api.html&#39;</span>)},
     <span class="co">&#39;api.html&#39;</span>: {(<span class="st">&#39;api.rst&#39;</span>, <span class="st">&#39;api.html&#39;</span>)}}</code></pre>

<p>Looking up the edges of a particular node would now be blazingly fast, at the cost of having to store every edge twice: once in a set of incoming edges, and once in a set of outgoing edges. But the edges in each set would have to be examined manually to see which are incoming and which are outgoing. It is also slightly redundant to keep naming the node over and over again in its set of edges.</p>

<p>The solution to both of these objections is to place incoming and outgoing edges in their own separate data structures, which will also absolve us of having to mention the node over and over again for every one of the edges in which it is involved.</p>

<pre class="sourceCode python"><code class="sourceCode python">    incoming = {
        <span class="st">&#39;tutorial.html&#39;</span>: {<span class="st">&#39;tutorial.rst&#39;</span>},
        <span class="co">&#39;index.html&#39;</span>: {<span class="st">&#39;index.rst&#39;</span>},
        <span class="co">&#39;api.html&#39;</span>: {<span class="st">&#39;api.rst&#39;</span>},
        }

    outgoing = {
        <span class="st">&#39;tutorial.rst&#39;</span>: {<span class="st">&#39;tutorial.html&#39;</span>},
        <span class="co">&#39;index.rst&#39;</span>: {<span class="st">&#39;index.html&#39;</span>},
        <span class="co">&#39;api.rst&#39;</span>: {<span class="st">&#39;api.html&#39;</span>},
        }</code></pre>

<p>Notice that <code>outgoing</code> represents, directly in Python syntax, exactly what we drew in <a href="#figure-4.1">Figure 4.1</a> earlier: the source documents on the left will be transformed by the build system into the output documents on the right. For this simple example each source points to only one output — all the output sets have only one element — but we will see examples shortly where a single input node has multiple downstream consequences.</p>

<p>Every edge in this dictionary-of-sets data structure does get represented twice, once as an outgoing edge from one node (<code>tutorial.rst</code> → <code>tutorial.html</code>) and again as an incoming edge to the other (<code>tutorial.html</code> ← <code>tutorial.rst</code>). These two representations capture precisely the same relationship, just from the opposite perspectives of the two nodes at either end of the edge. But in return for this redundancy, the data structure supports the fast lookup that Contingent needs.</p>

<h2 id="the-proper-use-of-classes">The Proper Use of Classes</h2>

<p>You may have been surprised by the absence of classes in the above discussion of Python data structures. After all, classes are a frequent mechanism for structuring applications and a hardly less-frequent subject of heated debate among their adherents and detractors. Classes were once thought important enough that entire educational curricula were designed around them, and the majority of popular programming languages include dedicated syntax for defining and using them.</p>

<p>But it turns out that classes are often orthogonal to the question of data structure design. Rather than offering us an entirely alternative data modeling paradigm, classes simply repeat data structures that we have already seen:</p>

<ul>
<li>A class instance is <em>implemented</em> as a dict.</li>
<li>A class instance is <em>used</em> like a mutable tuple.</li>
</ul>

<p>The class offers key lookup through a prettier syntax, where you get to say <code>graph.incoming</code> instead of <code>graph[&quot;incoming&quot;]</code>. But, in practice, class instances are almost never used as generic key-value stores. Instead, they are used to organize related but heterogeneous data by attribute name, with implementation details encapsulated behind a consistent and memorable interface.</p>

<p>So instead of putting a hostname and a port number together in a tuple and having to remember which came first and which came second, you create an <code>Address</code> class whose instances each have a <code>host</code> and a <code>port</code> attribute. You can then pass <code>Address</code> objects around where otherwise you would have had anonymous tuples. Code becomes easier to read and easier to write. But using a class instance does not really change any of the questions we faced above when doing data design; it just provides a prettier and less anonymous container.</p>

<p>The true value of classes, then, is not that they change the science of data design. The value of classes is that they let you <em>hide</em> your data design from the rest of a program!</p>

<p>Successful application design hinges upon our ability to exploit the powerful built-in data structures Python offers us while minimizing the volume of details we are required to keep in our heads at any one time. Classes provide the mechanism for resolving this apparent quandary: used effectively, a class provides a facade around some small subset of the system's overall design. When working within one subset — a <code>Graph</code>, for example — we can forget the implementation details of other subsets as long as we can remember their interfaces. In this way, programmers often find themselves navigating among several levels of abstraction in the course of writing a system, now working with the specific data model and implementation details for a particular subsystem, now connecting higher-level concepts through their interfaces.</p>

<p>For example, from the outside, code can simply ask for a new <code>Graph</code> instance:</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; <span class="ch">from</span> contingent <span class="ch">import</span> graphlib
&gt;&gt;&gt; g = graphlib.Graph()</code></pre>

<p>without needing to understand the details of how <code>Graph</code> works. Code that is simply using the graph sees only interface verbs — the method calls — when manipulating a graph, as when an edge is added or some other operation performed:</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; g.add_edge(<span class="st">&#39;index.rst&#39;</span>, <span class="st">&#39;index.html&#39;</span>)
&gt;&gt;&gt; g.add_edge(<span class="st">&#39;tutorial.rst&#39;</span>, <span class="st">&#39;tutorial.html&#39;</span>)
&gt;&gt;&gt; g.add_edge(<span class="st">&#39;api.rst&#39;</span>, <span class="st">&#39;api.html&#39;</span>)</code></pre>

<p>Careful readers will have noticed that we added edges to our graph without explicitly creating “node” and “edge” objects, and that the nodes themselves in these early examples are simply strings. Coming from other languages and traditions, one might have expected to see user-defined classes and interfaces for everything in the system:</p>

<pre class="sourceCode java"><code class="sourceCode java">    Graph g = <span class="kw">new</span> <span class="fu">ConcreteGraph</span>();
    Node indexRstNode = <span class="kw">new</span> <span class="fu">StringNode</span>(<span class="st">&quot;index.rst&quot;</span>);
    Node indexHtmlNode = <span class="kw">new</span> <span class="fu">StringNode</span>(<span class="st">&quot;index.html&quot;</span>);
    Edge indexEdge = <span class="kw">new</span> <span class="fu">DirectedEdge</span>(indexRstNode, indexHtmlNode);
    g.<span class="fu">addEdge</span>(indexEdge);</code></pre>

<p>The Python language and community explicitly and intentionally emphasize using simple, generic data structures to solve problems, instead of creating custom classes for every minute detail of the problem we want to tackle. This is one facet of the notion of “Pythonic” solutions: Pythonic solutions try to minimize syntactic overhead and leverage Python's powerful built-in tools and extensive standard library.</p>

<p>With these considerations in mind, let’s return to the <code>Graph</code> class, examining its design and implementation to see the interplay between data structures and class interfaces. When a new <code>Graph</code> instance is constructed, a pair of dictionaries has already been built to store edges using the logic we outlined in the previous section:</p>

<pre class="sourceCode python"><code class="sourceCode python"><span class="kw">class</span> Graph:
    <span class="co">&quot;&quot;&quot;A directed graph of the relationships among build tasks.&quot;&quot;&quot;</span>

    <span class="kw">def</span> <span class="ot">__init__</span>(<span class="ot">self</span>):
        <span class="ot">self</span>._inputs_of = defaultdict(<span class="dt">set</span>)
        <span class="ot">self</span>._consequences_of = defaultdict(<span class="dt">set</span>)</code></pre>

<p>The leading underscore in front of the attribute names <code>_inputs_of</code> and <code>_consequences_of</code> is a common convention in the Python community to signal that an attribute is private. This convention is one way the community suggests that programmers pass messages and warnings through space and time to each other. Recognizing the need to signal differences between public and internal object attributes, the community adopted the single leading underscore as a concise and fairly consistent indicator to other programmers, including our future selves, that the attribute is best treated as part of the invisible internal machinery of the class.</p>

<p>Why are we using a <code>defaultdict</code> instead of a standard dict? A common problem when composing dicts with other data structures is handling missing keys. With a normal dict, retrieving a key that does not exist raises a <code>KeyError</code>:</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; consequences_of = {}
&gt;&gt;&gt; consequences_of[<span class="st">&#39;index.rst&#39;</span>].add(<span class="st">&#39;index.html&#39;</span>)
Traceback (most recent call last):
     ...
<span class="ot">KeyError</span>: <span class="st">&#39;index.rst&#39;</span></code></pre>

<p>Using a normal dict requires special checks throughout the code to handle this specific case, for example when adding a new edge:</p>

<pre class="sourceCode python"><code class="sourceCode python">    <span class="co"># Special case to handle “we have not seen this task yet”:</span>

    <span class="kw">if</span> input_task not in <span class="ot">self</span>._consequences_of:
        <span class="ot">self</span>._consequences_of[input_task] = <span class="dt">set</span>()

    <span class="ot">self</span>._consequences_of[input_task].add(consequence_task)</code></pre>

<p>This need is so common that Python includes a special utility, the <code>defaultdict</code>, which lets you provide a function that returns a value for absent keys. When we ask about an edge that the <code>Graph</code> hasn't yet seen, we will get back an empty <code>set</code> instead of an exception:</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; <span class="ch">from</span> collections <span class="ch">import</span> defaultdict
&gt;&gt;&gt; consequences_of = defaultdict(<span class="dt">set</span>)
&gt;&gt;&gt; consequences_of[<span class="st">&#39;api.rst&#39;</span>]
<span class="dt">set</span>()</code></pre>

<p>Structuring our implementation this way means that each key’s first use can look identical to second and subsequent times that a particular key is used:</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; consequences_of[<span class="st">&#39;index.rst&#39;</span>].add(<span class="st">&#39;index.html&#39;</span>)
&gt;&gt;&gt; <span class="st">&#39;index.html&#39;</span> in consequences_of[<span class="st">&#39;index.rst&#39;</span>]
<span class="ot">True</span></code></pre>

<p>Given these techniques, let’s examine the implementation of <code>add_edge</code>, which we earlier used to build the graph for <a href="#figure-4.1">Figure 4.1</a>.</p>

<pre class="sourceCode python"><code class="sourceCode python">    <span class="kw">def</span> add_edge(<span class="ot">self</span>, input_task, consequence_task):
        <span class="co">&quot;&quot;&quot;Add an edge: `consequence_task` uses the output of `input_task`.&quot;&quot;&quot;</span>
        <span class="ot">self</span>._consequences_of[input_task].add(consequence_task)
        <span class="ot">self</span>._inputs_of[consequence_task].add(input_task)</code></pre>

<p>This method hides the fact that two, not one, storage steps are required for each new edge so that we know about it in both directions. And notice how <code>add_edge()</code> does not know or care whether either node has been seen before. Because the inputs and consequences data structures are each a <code>defaultdict(set)</code>, the <code>add_edge()</code> method remains blissfully ignorant as to the novelty of a node — the <code>defaultdict</code> takes care of the difference by creating a new <code>set</code> object on the fly. As we saw above, <code>add_edge()</code> would be three times longer had we not used <code>defaultdict</code>. More importantly, it would be more difficult to understand and reason about the resulting code. This implementation demonstrates a Pythonic approach to problems: simple, direct, and concise.</p>

<p>Callers should also be given a simple way to visit every edge without having to learn how to traverse our data structure:</p>

<pre class="sourceCode python"><code class="sourceCode python">    <span class="kw">def</span> edges(<span class="ot">self</span>):
        <span class="co">&quot;&quot;&quot;Return all edges as ``(input_task, consequence_task)`` tuples.&quot;&quot;&quot;</span>
        <span class="kw">return</span> [(a, b) <span class="kw">for</span> a in <span class="ot">self</span>.<span class="dt">sorted</span>(<span class="ot">self</span>._consequences_of)
                       <span class="kw">for</span> b in <span class="ot">self</span>.<span class="dt">sorted</span>(<span class="ot">self</span>._consequences_of[a])]</code></pre>

<p>The <code>Graph.sorted()</code> method makes an attempt to sort the nodes in a natural sort order (such as alphabetical) that can provide a stable output order for the user.</p>

<p>By using this traversal method we can see that, following our three “add” method calls earlier, <code>g</code> now represents the same graph that we saw in <a href="#figure-4.1">Figure 4.1</a>.</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; <span class="ch">from</span> pprint <span class="ch">import</span> pprint
&gt;&gt;&gt; pprint(g.edges())
[(<span class="st">&#39;api.rst&#39;</span>, <span class="st">&#39;api.html&#39;</span>),
 (<span class="st">&#39;index.rst&#39;</span>, <span class="st">&#39;index.html&#39;</span>),
 (<span class="st">&#39;tutorial.rst&#39;</span>, <span class="st">&#39;tutorial.html&#39;</span>)]</code></pre>

<p>Since we now have a real live Python object, and not just a figure, we can ask it interesting questions! For example, when Contingent is building a blog from source files, it will need to know things like “What depends on <code>api.rst</code>?” when the content of <code>api.rst</code> changes:</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; g.immediate_consequences_of(<span class="st">&#39;api.rst&#39;</span>)
[<span class="st">&#39;api.html&#39;</span>]</code></pre>

<p>This <code>Graph</code> is telling Contingent that, when <code>api.rst</code> changes, <code>api.html</code> is now stale and must be rebuilt.</p>

<p>How about <code>index.html</code>?</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; g.immediate_consequences_of(<span class="st">&#39;index.html&#39;</span>)
[]</code></pre>

<p>An empty list has been returned, signalling that <code>index.html</code> is at the right edge of the graph and so nothing further needs to be rebuilt if it changes. This query can be expressed very simply thanks to the work that has already gone in to laying out our data:</p>

<pre class="sourceCode python"><code class="sourceCode python">    <span class="kw">def</span> immediate_consequences_of(<span class="ot">self</span>, task):
        <span class="co">&quot;&quot;&quot;Return the tasks that use `task` as an input.&quot;&quot;&quot;</span>
        <span class="kw">return</span> <span class="ot">self</span>.<span class="dt">sorted</span>(<span class="ot">self</span>._consequences_of[task])</code></pre>

<pre class="sourceCode python"><code class="sourceCode python"> &gt;&gt;&gt; <span class="ch">from</span> contingent.rendering <span class="ch">import</span> as_graphviz
 &gt;&gt;&gt; <span class="dt">open</span>(<span class="st">&#39;figure1.dot&#39;</span>, <span class="st">&#39;w&#39;</span>).write(as_graphviz(g)) and <span class="ot">None</span></code></pre>

<p><a href="#figure-4.1">Figure 4.1</a> ignored one of the most important relationships that we discovered in the opening section of our chapter: the way that document titles appear in the table of contents. Let’s fill in this detail. We will create a node for each title string that needs to be generated by parsing an input file and then passed to one of our other routines:</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; g.add_edge(<span class="st">&#39;api.rst&#39;</span>, <span class="st">&#39;api-title&#39;</span>)
&gt;&gt;&gt; g.add_edge(<span class="st">&#39;api-title&#39;</span>, <span class="st">&#39;index.html&#39;</span>)
&gt;&gt;&gt; g.add_edge(<span class="st">&#39;tutorial.rst&#39;</span>, <span class="st">&#39;tutorial-title&#39;</span>)
&gt;&gt;&gt; g.add_edge(<span class="st">&#39;tutorial-title&#39;</span>, <span class="st">&#39;index.html&#39;</span>)</code></pre>

<p>The result is a graph (<a href="#figure-4.2">Figure 4.2</a>) that could properly handle rebuilding the table of contents that we discussed in the opening of this chapter.</p>

<div class="center figure">
<a name="figure-4.2"></a><img src="contingent-images/figure2.png" alt="Figure 4.2 - Being prepared to rebuild `index.html` whenever any title that it mentions gets changed." title="Figure 4.2 - Being prepared to rebuild `index.html` whenever any title that it mentions gets changed." />
</div>

<p class="center figcaption">
<small>Figure 4.2 - Being prepared to rebuild <code>index.html</code> whenever any title that it mentions gets changed.</small>
</p>

<p>This manual walk-through illustrates what we will eventually have Contingent do for us: the graph <code>g</code> captures the inputs and consequences for the various artifacts in our project's documentation.</p>

<h2 id="learning-connections">Learning Connections</h2>

<p>We now have a way for Contingent to keep track of tasks and the relationships between them. If we look more closely at <a href="#figure-4.2">Figure 4.2</a>, however, we see that it is actually a little hand-wavy and vague: <em>how</em> is <code>api.html</code> produced from <code>api.rst</code>? How do we know that <code>index.html</code> needs the title from the tutorial? And how is this dependency resolved?</p>

<p>Our intuitive notion of these ideas served when we were constructing consequences graphs by hand, but unfortunately computers are not terribly intuitive, so we will need to be more precise about what we want.</p>

<p>What are the steps required to produce output from sources? How are these steps defined and executed? And how can Contingent know the connections between them?</p>

<p>In Contingent, build tasks are modeled as functions plus arguments. The functions define actions that a particular project understands how to perform. The arguments provide the specifics: <em>which</em> source document should be read, <em>which</em> blog title is needed. As they are running, these functions may in turn invoke <em>other</em> task functions, passing whatever arguments they need answers for.</p>

<p>To see how this works, we will actually now implement the documentation builder described at the beginning of the chapter. In order to prevent ourselves from wallowing around in a bog of details, for this illustration we will work with simplified input and output document formats. Our input documents will consist of a title on the first line, with the remainder of the text forming the body. Cross references will simply be source file names enclosed in backticks, which on output are replaced with the title from the corresponding document in the output.</p>

<p>Here is the content of our example <code>index.txt</code>, <code>api.txt</code>, and <code>tutorial.txt</code>, illustrating titles, document bodies, and cross-references from our little document format:</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; index = <span class="st">&quot;&quot;&quot;</span>
<span class="st">... Table of Contents</span>
<span class="st">... -----------------</span>
<span class="st">... * `tutorial.txt`</span>
<span class="st">... * `api.txt`</span>
<span class="st">... &quot;&quot;&quot;</span>

&gt;&gt;&gt; tutorial = <span class="st">&quot;&quot;&quot;</span>
<span class="st">... Beginners Tutorial</span>
<span class="st">... ------------------</span>
<span class="st">... Welcome to the tutorial!</span>
<span class="st">... We hope you enjoy it.</span>
<span class="st">... &quot;&quot;&quot;</span>

&gt;&gt;&gt; api = <span class="st">&quot;&quot;&quot;</span>
<span class="st">... API Reference</span>
<span class="st">... -------------</span>
<span class="st">... You might want to read</span>
<span class="st">... the `tutorial.txt` first.</span>
<span class="st">... &quot;&quot;&quot;</span></code></pre>

<p>Now that we have some source material to work with, what functions would a Contingent-based blog builder need?</p>

<p>In the simple examples above, the HTML output files proceed directly from the source, but in a realistic system, turning source into markup involves several steps: reading the raw text from disk, parsing the text to a convenient internal representation, processing any directives the author may have specified, resolving cross-references or other external dependencies (such as include files), and applying one or more view transformations to convert the internal representation to its output form.</p>

<p>Contingent manages tasks by grouping them into a <code>Project</code>, a sort of build system busybody that injects itself into the middle of the build process, noting every time one task talks to another to construct a graph of the relationships between all the tasks.</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; <span class="ch">from</span> contingent.projectlib <span class="ch">import</span> Project, Task
&gt;&gt;&gt; project = Project()
&gt;&gt;&gt; task = project.task</code></pre>

<p>A build system for the example given at the beginning of the chapter might involve a few tasks.</p>

<p>Our <code>read()</code> task will pretend to read the files from disk. Since we really defined the source text in variables, all it needs to do is convert from a filename to the corresponding text.</p>

<pre class="sourceCode python"><code class="sourceCode python">  &gt;&gt;&gt; filesystem = {<span class="st">&#39;index.txt&#39;</span>: index,
  ...               <span class="st">&#39;tutorial.txt&#39;</span>: tutorial,
  ...               <span class="st">&#39;api.txt&#39;</span>: api}
  ...
  &gt;&gt;&gt; @task
  ... <span class="kw">def</span> read(filename):
  ...     <span class="kw">return</span> filesystem[filename]</code></pre>

<p>The <code>parse()</code> task interprets the raw text of the file contents according to the specification of our document format. Our format is very simple: the title of the document appears on the first line, and the rest of the content is considered the document's body.</p>

<pre class="sourceCode python"><code class="sourceCode python">  &gt;&gt;&gt; @task
  ... <span class="kw">def</span> parse(filename):
  ...     lines = read(filename).strip().splitlines()
  ...     title = lines[<span class="dv">0</span>]
  ...     body = <span class="st">&#39;</span><span class="ch">\n</span><span class="st">&#39;</span>.join(lines[<span class="dv">2</span>:])
  ...     <span class="kw">return</span> title, body</code></pre>

<p>Because the format is so simple, the parser is a little silly, but it illustrates the interpretive responsibilities that parsers are required to carry out. (Parsing in general is a very interesting subject and many books have been written either partially or completely about it.) In a system like Sphinx, the parser must understand the many markup tokens, directives, and commands defined by the system, transforming the input text into something the rest of the system can work with.</p>

<p>Notice the connection point between <code>parse()</code> and <code>read()</code> — the first task in parsing is to pass the filename it has been given to <code>read()</code>, which finds and returns the contents of that file.</p>

<p>The <code>title_of()</code> task, given a source file name, returns the document's title:</p>

<pre class="sourceCode python"><code class="sourceCode python">  &gt;&gt;&gt; @task
  ... <span class="kw">def</span> title_of(filename):
  ...     title, body = parse(filename)
  ...     <span class="kw">return</span> title</code></pre>

<p>This task nicely illustrates the separation of responsibilities between the parts of a document processing system. The <code>title_of()</code> function works directly from an in-memory representation of a document — in this case, a tuple — instead of taking it upon itself to re-parse the entire document again just to find the title. The <code>parse()</code> function alone produces the in-memory representation, in accordance with the contract of the system specification, and the rest of the blog builder processing functions like <code>title_of()</code> simply use its output as their authority.</p>

<p>If you are coming from an orthodox object-oriented tradition, this function-oriented design may look a little weird. In an OO solution, <code>parse()</code> would return some sort of <code>Document</code> object that has <code>title_of()</code> as a method or property. In fact, Sphinx works exactly this way: its <code>Parser</code> subsystem produces a “Docutils document tree” object for the other parts of the system to use.</p>

<p>Contingent is not opinionated with regard to these differing design paradigms and supports either approach equally well. For this chapter we are keeping things simple.</p>

<p>The final task, <code>render()</code>, turns the in-memory representation of a document into an output form. It is, in effect, the inverse of <code>parse()</code>. Whereas <code>parse()</code> takes an input document conforming to a specification and converts it to an in-memory representation, <code>render()</code> takes an in-memory representation and produces an output document conforming to some specification.</p>

<pre class="sourceCode python"><code class="sourceCode python">  &gt;&gt;&gt; <span class="ch">import</span> re
  &gt;&gt;&gt;
  &gt;&gt;&gt; LINK = <span class="st">&#39;&lt;a href=&quot;{}&quot;&gt;{}&lt;/a&gt;&#39;</span>
  &gt;&gt;&gt; PAGE = <span class="st">&#39;&lt;h1&gt;{}&lt;/h1&gt;</span><span class="ch">\n</span><span class="st">&lt;p&gt;</span><span class="ch">\n</span><span class="st">{}</span><span class="ch">\n</span><span class="st">&lt;p&gt;&#39;</span>
  &gt;&gt;&gt;
  &gt;&gt;&gt; <span class="kw">def</span> make_link(match):
  ...     filename = match.group(<span class="dv">1</span>)
  ...     <span class="kw">return</span> LINK.<span class="dt">format</span>(filename, title_of(filename))
  ...
  &gt;&gt;&gt; @task
  ... <span class="kw">def</span> render(filename):
  ...     title, body = parse(filename)
  ...     body = re.sub(<span class="st">r&#39;`([^`]+)`&#39;</span>, make_link, body)
  ...     <span class="kw">return</span> PAGE.<span class="dt">format</span>(title, body)</code></pre>

<p>Here is an example run that will invoke every stage of the above logic — rendering <code>tutorial.txt</code> to produce its output:</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; <span class="dt">print</span>(render(<span class="st">&#39;tutorial.txt&#39;</span>))
&lt;h1&gt;Beginners Tutorial&lt;/h1&gt;
&lt;p&gt;
Welcome to the tutorial!
We hope you enjoy it.
&lt;p&gt;</code></pre>

<p><a href="#figure-4.3">Figure 4.3</a> illustrates the task graph that transitively connects all the tasks required to produce the output, from reading the input file, to parsing and transforming the document, and rendering it:</p>

<div class="center figure">
<a name="figure-4.3"></a><img src="contingent-images/figure3.png" alt="Figure 4.3 - A task graph." title="Figure 4.3 - A task graph." />
</div>

<p class="center figcaption">
<small>Figure 4.3 - A task graph.</small>
</p>

<p>It turns out that <a href="#figure-4.3">Figure 4.3</a> was not hand-drawn for this chapter, but has been generated directly from Contingent! Building this graph is possible for the <code>Project</code> object because it maintains its own call stack, similar to the stack of live execution frames that Python maintains to remember which function to continue running when the current one returns.</p>

<p>Every time a new task is invoked, Contingent can assume that it has been called — and that its output will be used — by the task currently at the top of the stack. Maintaining the stack will require that several extra steps surround the invocation of a task <em>T</em>:</p>

<ol style="list-style-type: decimal">
<li>Push <em>T</em> onto the stack.</li>
<li>Execute <em>T</em>, letting it call any other tasks it needs.</li>
<li>Pop <em>T</em> off the stack.</li>
<li>Return its result.</li>
</ol>

<p>To intercept task calls, the <code>Project</code> leverages a key Python feature: <em>function decorators</em>. A decorator is allowed to process or transform a function at the moment that it is being defined. The <code>Project.task</code> decorator uses this opportunity to package every task inside another function, a <em>wrapper</em>, which allows a clean separation of responsibilities between the wrapper — which will worry about graph and stack management on behalf of the Project — and our task functions that focus on document processing. Here is what the <code>task</code> decorator boilerplate looks like:</p>

<pre class="sourceCode python"><code class="sourceCode python">        <span class="ch">from</span> functools <span class="ch">import</span> wraps

        <span class="kw">def</span> task(function):
            <span class="ot">@wraps</span>(function)
            <span class="kw">def</span> wrapper(*args):
                <span class="co"># wrapper body, that will call function()</span>
            <span class="kw">return</span> wrapper</code></pre>

<p>This is an entirely typical Python decorator declaration. It can then be applied to a function by naming it after an <code>@</code> character atop the <code>def</code> that creates the function:</p>

<pre class="sourceCode python"><code class="sourceCode python">    <span class="ot">@task</span>
    <span class="kw">def</span> title_of(filename):
        title, body = parse(filename)
        <span class="kw">return</span> title</code></pre>

<p>When this definition is complete, the name <code>title_of</code> will refer to the wrapped version of the function. The wrapper can access the original version of the function via the name <code>function</code>, calling it at the appropriate time. The body of the Contingent wrapper runs something like this:</p>

<pre class="sourceCode python"><code class="sourceCode python">    <span class="kw">def</span> task(function):
        <span class="ot">@wraps</span>(function)
        <span class="kw">def</span> wrapper(*args):
            task = Task(wrapper, args)
            <span class="kw">if</span> <span class="ot">self</span>.task_stack:
                <span class="ot">self</span>._graph.add_edge(task, <span class="ot">self</span>.task_stack[-<span class="dv">1</span>])
            <span class="ot">self</span>._graph.clear_inputs_of(task)
            <span class="ot">self</span>._task_stack.append(task)
            <span class="kw">try</span>:
                value = function(*args)
            <span class="kw">finally</span>:
                <span class="ot">self</span>._task_stack.pop()

            <span class="kw">return</span> value
        <span class="kw">return</span> wrapper</code></pre>

<p>This wrapper performs several crucial maintenance steps:</p>

<ol style="list-style-type: decimal">
<li><p>Packages the task — a function plus its arguments — into a small object for convenience. The <code>wrapper</code> here names the wrapped version of the task function.</p></li>
<li><p>If this task has been invoked by a current task that is already underway, add an edge capturing the fact that this task is an input to the already-running task.</p></li>
<li><p>Forget whatever we might have learned last time about the task, since it might make new decisions this time — if the source text of the API guide no longer mentions the Tutorial, for example, then its <code>render()</code> will no longer ask for the <code>title_of()</code> the Tutorial document.</p></li>
<li><p>Push this task onto the top of the task stack in case it decides, in its turn, to invoke further tasks in the course of doing its work.</p></li>
<li><p>Invoke the task inside of a <code>try...finally</code> block that ensures we correctly remove the finished task from the stack, even if it dies by raising an exception.</p></li>
<li><p>Return the task’s return value, so that callers of this wrapper will not be able to tell that they have not simply invoked the plain task function itself.</p></li>
</ol>

<p>Steps 4 and 5 maintain the task stack itself, which is then used by step 2 to perform the consequences tracking that is our whole reason for building a task stack in the first place.</p>

<p>Since each task gets surrounded by its own copy of the wrapper function, the mere invocation and execution of the normal stack of tasks will produce a graph of relationships as an invisible side effect. That is why we were careful to use the wrapper around each processing step that we defined:</p>

<pre class="sourceCode python"><code class="sourceCode python">    <span class="ot">@task</span>
    <span class="kw">def</span> read(filename):
        <span class="co"># body of read</span>

    <span class="ot">@task</span>
    <span class="kw">def</span> parse(filename):
        <span class="co"># body of parse</span>

    <span class="ot">@task</span>
    <span class="kw">def</span> title_of(filename):
        <span class="co"># body of title_of</span>

    <span class="ot">@task</span>
    <span class="kw">def</span> render(filename):
        <span class="co"># body of render</span></code></pre>

<p>Thanks to these wrappers, when we called <code>parse('tutorial.txt')</code> the decorator learned the connection between <code>parse</code> and <code>read</code>. We can ask about the relationship by building another <code>Task</code> tuple and asking what the consequences would be if its output value changed:</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; task = Task(read, (<span class="st">&#39;tutorial.txt&#39;</span>,))
&gt;&gt;&gt; <span class="dt">print</span>(task)
read(<span class="st">&#39;tutorial.txt&#39;</span>)
&gt;&gt;&gt; project._graph.immediate_consequences_of(task)
[parse(<span class="st">&#39;tutorial.txt&#39;</span>)]</code></pre>

<p>The consequence of re-reading the <code>tutorial.txt</code> file and finding that its contents have changed is that we need to re-execute the <code>parse()</code> routine for that document. What happens if we render the entire set of documents? Will Contingent be able to learn the entire build process?</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; <span class="kw">for</span> filename in <span class="st">&#39;index.txt&#39;</span>, <span class="st">&#39;tutorial.txt&#39;</span>, <span class="st">&#39;api.txt&#39;</span>:
...     <span class="dt">print</span>(render(filename))
...     <span class="dt">print</span>(<span class="st">&#39;=&#39;</span> * <span class="dv">30</span>)
...
&lt;h1&gt;Table of Contents&lt;/h1&gt;
&lt;p&gt;
* &lt;a href=<span class="st">&quot;tutorial.txt&quot;</span>&gt;Beginners Tutorial&lt;/a&gt;
* &lt;a href=<span class="st">&quot;api.txt&quot;</span>&gt;API Reference&lt;/a&gt;
&lt;p&gt;
==============================
&lt;h1&gt;Beginners Tutorial&lt;/h1&gt;
&lt;p&gt;
Welcome to the tutorial!
We hope you enjoy it.
&lt;p&gt;
==============================
&lt;h1&gt;API Reference&lt;/h1&gt;
&lt;p&gt;
You might want to read
the &lt;a href=<span class="st">&quot;tutorial.txt&quot;</span>&gt;Beginners Tutorial&lt;/a&gt; first.
&lt;p&gt;
==============================</code></pre>

<p>It worked! From the output, we can see that our transform substituted the document titles for the directives in our source documents, indicating that Contingent was able to discover the connections between the various tasks needed to build our documents.</p>

<div class="center figure">
<a name="figure-4.4"></a><img src="contingent-images/figure4.png" alt="Figure 4.4 - The complete set of relationships between our input files and our HTML outputs." title="Figure 4.4 - The complete set of relationships between our input files and our HTML outputs." />
</div>

<p class="center figcaption">
<small>Figure 4.4 - The complete set of relationships between our input files and our HTML outputs.</small>
</p>

<p>By watching one task invoke another through the <code>task</code> wrapper machinery, <code>Project</code> has automatically learned the graph of inputs and consequences. Since it has a complete consequences graph at its disposal, Contingent knows all the things to rebuild if the inputs to any tasks change.</p>

<h2 id="chasing-consequences">Chasing Consequences</h2>

<p>Once the initial build has run to completion, Contingent needs to monitor the input files for changes. When the user finishes a new edit and runs “Save”, both the <code>read()</code> method and its consequences need to be invoked.</p>

<p>This will require us to walk the graph in the opposite order from the one in which it was created. It was built, you will recall, by calling <code>render()</code> for the API Reference and having that call <code>parse()</code> which finally invoked the <code>read()</code> task. Now we go in the other direction: we know that <code>read()</code> will now return new content, and we need to figure out what consequences lie downstream.</p>

<p>The process of compiling consequences is a recursive one, as each consequence can itself have further tasks that depended on it. We could perform this recursion manually through repeated calls to the graph. (Note that we are here taking advantage of the fact that the Python prompt saves the last value displayed under the name <code>_</code> for use in the subsequent expression.)</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; task = Task(read, (<span class="st">&#39;api.txt&#39;</span>,))
&gt;&gt;&gt; project._graph.immediate_consequences_of(task)
[parse(<span class="st">&#39;api.txt&#39;</span>)]
&gt;&gt;&gt; t1, = _
&gt;&gt;&gt; project._graph.immediate_consequences_of(t1)
[render(<span class="st">&#39;api.txt&#39;</span>), title_of(<span class="st">&#39;api.txt&#39;</span>)]
&gt;&gt;&gt; t2, t3 = _
&gt;&gt;&gt; project._graph.immediate_consequences_of(t2)
[]
&gt;&gt;&gt; project._graph.immediate_consequences_of(t3)
[render(<span class="st">&#39;index.txt&#39;</span>)]
&gt;&gt;&gt; t4, = _
&gt;&gt;&gt; project._graph.immediate_consequences_of(t4)
[]</code></pre>

<p>This recursive task of looking repeatedly for immediate consequences and only stopping when we arrive at tasks with no further consequences is a basic enough graph operation that it is supported directly by a method on the <code>Graph</code> class:</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; <span class="co"># Secretly adjust pprint to a narrower-than-usual width:</span>
&gt;&gt;&gt; _pprint = pprint
&gt;&gt;&gt; pprint = <span class="kw">lambda</span> x: _pprint(x, width=<span class="dv">40</span>)
&gt;&gt;&gt; pprint(project._graph.recursive_consequences_of([task]))
[parse(<span class="st">&#39;api.txt&#39;</span>),
 render(<span class="st">&#39;api.txt&#39;</span>),
 title_of(<span class="st">&#39;api.txt&#39;</span>),
 render(<span class="st">&#39;index.txt&#39;</span>)]</code></pre>

<p>In fact, <code>recursive_consequences_of()</code> tries to be a bit clever. If a particular task appears repeatedly as a downstream consequence of several other tasks, then it is careful to only mention it once in the output list, and to move it close to the end so that it appears only after the tasks that are its inputs. This intelligence is powered by the classic depth-first implementation of a topological sort, an algorithm which winds up being fairly easy to write in Python through a hidden recursive helper function. Check out the <code>graphlib.py</code> source code for the details.</p>

<p>If, upon detecting a change, we are careful to re-run every task in the recursive consequences, then Contingent will be able to avoid rebuilding too little. Our second challenge, however, was to avoid rebuilding too much. Refer again to <a href="#figure-4.4">Figure 4.4</a>. We want to avoid rebuilding all three documents every time that <code>tutorial.txt</code> is changed, since most edits will probably not affect its title but only its body. How can this be accomplished?</p>

<p>The solution is to make graph recomputation dependent on caching. When stepping forward through the recursive consequences of a change, we will only invoke tasks whose inputs are different than last time.</p>

<p>This optimization will involve a final data structure. We will give the <code>Project</code> a <code>_todo</code> set with which to remember every task for which at least one input value has changed, and which therefore requires re-execution. Because only tasks in <code>_todo</code> are out-of-date, the build process can skip running any tasks unless they appear there.</p>

<p>Again, Python’s convenient and unified design makes these features very easy to code. Because task objects are hashable, <code>_todo</code> can simply be a set that remembers task items by identity — guaranteeing that a task never appears twice — and the <code>_cache</code> of return values from previous runs can be a dict with tasks as keys.</p>

<p>More precisely, the rebuild step must keep looping as long as <code>_todo</code> is non-empty. During each loop, it should:</p>

<ul>
<li><p>Call <code>recursive_consequences_of()</code> and pass in every task listed in <code>_todo</code>. The return value will be a list of not only the <code>_todo</code> tasks themselves, but also every task downstream of them — every task, in other words, that could possibly need re-execution if the outputs come out different this time.</p></li>
<li><p>For each task in the list, check whether it is listed in <code>_todo</code>. If not, then we can skip running it, because none of the tasks that we have re-invoked upstream of it has produced a new return value that would require the task’s recomputation.</p></li>
<li><p>But for any task that is indeed listed in <code>_todo</code> by the time we reach it, we need to ask it to re-run and re-compute its return value. If the task wrapper function detects that this return value does not match the old cached value, then its downstream tasks will be automatically added to <code>_todo</code> before we reach them in the list of recursive consequences.</p></li>
</ul>

<p>By the time we reach the end of the list, every task that could possibly need to be re-run should in fact have been re-run. But just in case, we will check <code>_todo</code> and try again if it is not yet empty. Even for very rapidly changing dependency trees, this should quickly settle out. Only a cycle — where, for example, task <em>A</em> needs the output of task <em>B</em> which itself needs the output of task <em>A</em> — could keep the builder in an infinite loop, and only if their return values never stabilize. Fortunately, real-world build tasks are typically without cycles.</p>

<p>Let us trace the behavior of this system through an example.</p>

<p>Suppose you edit <code>tutorial.txt</code> and change both the title and the body content. We can simulate this by modifying the value in our <code>filesystem</code> dict:</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; filesystem[<span class="st">&#39;tutorial.txt&#39;</span>] = <span class="st">&quot;&quot;&quot;</span>
<span class="st">... The Coder Tutorial</span>
<span class="st">... ------------------</span>
<span class="st">... This is a new and improved</span>
<span class="st">... introductory paragraph.</span>
<span class="st">... &quot;&quot;&quot;</span></code></pre>

<p>Now that the contents have changed, we can ask the Project to re-run the <code>read()</code> task by using its <code>cache_off()</code> context manager that temporarily disables its willingness to return its old cached result for a given task and argument:</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; <span class="kw">with</span> project.cache_off():
...     text = read(<span class="st">&#39;tutorial.txt&#39;</span>)</code></pre>

<p>The new tutorial text has now been read into the cache. How many downstream tasks will need to be re-executed?</p>

<p>To help us answer this question, the <code>Project</code> class supports a simple tracing facility that will tell us which tasks are executed in the course of a rebuild. Since the above change to <code>tutorial.txt</code> affects both its body and its title, everything downstream will need to be re-computed:</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; project.start_tracing()
&gt;&gt;&gt; project.rebuild()
&gt;&gt;&gt; <span class="dt">print</span>(project.stop_tracing())
calling parse(<span class="st">&#39;tutorial.txt&#39;</span>)
calling render(<span class="st">&#39;tutorial.txt&#39;</span>)
calling title_of(<span class="st">&#39;tutorial.txt&#39;</span>)
calling render(<span class="st">&#39;api.txt&#39;</span>)
calling render(<span class="st">&#39;index.txt&#39;</span>)</code></pre>

<p>Looking back at <a href="#figure-4.4">Figure 4.4</a>, you can see that, as expected, this is every task that is an immediate or downstream consequence of <code>read('tutorial.txt')</code>.</p>

<p>But what if we edit it again, but this time leave the title the same?</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; filesystem[<span class="st">&#39;tutorial.txt&#39;</span>] = <span class="st">&quot;&quot;&quot;</span>
<span class="st">... The Coder Tutorial</span>
<span class="st">... ------------------</span>
<span class="st">... Welcome to the coder tutorial!</span>
<span class="st">... It should be read top to bottom.</span>
<span class="st">... &quot;&quot;&quot;</span>
&gt;&gt;&gt; <span class="kw">with</span> project.cache_off():
...     text = read(<span class="st">&#39;tutorial.txt&#39;</span>)</code></pre>

<p>This small, limited change should have no effect on the other documents.</p>

<pre class="sourceCode python"><code class="sourceCode python">&gt;&gt;&gt; project.start_tracing()
&gt;&gt;&gt; project.rebuild()
&gt;&gt;&gt; <span class="dt">print</span>(project.stop_tracing())
calling parse(<span class="st">&#39;tutorial.txt&#39;</span>)
calling render(<span class="st">&#39;tutorial.txt&#39;</span>)
calling title_of(<span class="st">&#39;tutorial.txt&#39;</span>)</code></pre>

<p>Success! Only one document got rebuilt. The fact that <code>title_of()</code>, given a new input document, nevertheless returned the same value, means that all further downstream tasks were insulated from the change and did not get re-invoked.</p>

<h2 id="conclusion">Conclusion</h2>

<p>There exist languages and programming methodologies under which Contingent would be a suffocating forest of tiny classes, with verbose names given to every concept in the problem domain.</p>

<p>When programming Contingent in Python, however, we skipped the creation of a dozen possible classes like <code>TaskArgument</code> and <code>CachedResult</code> and <code>ConsequenceList</code>. We instead drew upon Python’s strong tradition of solving generic problems with generic data structures, resulting in code that repeatedly uses a small set of ideas from the core data structures tuple, list, set, and dict.</p>

<p>But does this not cause a problem?</p>

<p>Generic data structures are also, by their nature, anonymous. Our <code>project._cache</code> is a set. So is every collection of upstream and downstream nodes inside the <code>Graph</code>. Are we in danger of seeing generic <code>set</code> error messages and not knowing whether to look in the project or the graph implementation for the error?</p>

<p>In fact, we are not in danger!</p>

<p>Thanks to the careful discipline of encapsulation — of only allowing <code>Graph</code> code to touch the graph’s sets, and <code>Project</code> code to touch the project’s set — there will never be ambiguity if a set operation returns an error during a later phase of the project. The name of the innermost executing method at the moment of the error will necessarily direct us to exactly the class, and set, involved in the mistake. There is no need to create a subclass of <code>set</code> for every possible application of the data type, so long as we put that conventional underscore in front of data structure attributes and then are careful not to touch them from code outside of the class.</p>

<p>Contingent demonstrates how crucial the Facade pattern, from the epochal <em>Design Patterns</em> book, is for a well-designed Python program. Not every data structure and fragment of data in a Python program gets to be its own class. Instead, classes are used sparingly, at conceptual pivots in the code where a big idea — like the idea of a dependency graph — can be wrapped up into a Facade that hides the details of the simple generic data structures that lie beneath it.</p>

<p>Code outside of the Facade names the big concepts that it needs and the operations that it wants to perform. Inside of the Facade, the programmer manipulates the small and convenient moving parts of the Python programming language to make the operations happen.</p>
  </body>
</html>
